Results 1-14 of 14
1.
J Biomed Inform ; : 104649, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38697494

ABSTRACT

OBJECTIVE: Automated identification of eligible patients is a bottleneck in clinical research. We propose Criteria2Query (C2Q) 3.0, a system that leverages GPT-4 for the semi-automatic transformation of clinical trial eligibility criteria text into executable clinical database queries. MATERIALS AND METHODS: C2Q 3.0 integrates three GPT-4 prompts, for concept extraction, SQL query generation, and reasoning, each designed and evaluated separately. The concept extraction prompt was benchmarked against manual annotations of 20 clinical trials by two evaluators, who also measured SQL generation accuracy and identified errors in GPT-generated SQL queries for 5 clinical trials. The reasoning prompt was assessed by three evaluators on four metrics (readability, correctness, coherence, and usefulness) using corrected SQL queries and an open-ended feedback questionnaire. RESULTS: Across 518 concepts from 20 clinical trials, GPT-4 achieved an F1-score of 0.891 in concept extraction. For SQL generation, 29 errors spanning seven categories were detected, with logic errors being the most common (n = 10; 34.48%). The reasoning evaluations yielded a high mean coherence score (4.70) but a relatively lower mean readability score (3.95); the mean correctness and usefulness scores were 3.97 and 4.37, respectively. CONCLUSION: GPT-4 significantly improves the accuracy of extracting clinical trial eligibility criteria concepts in C2Q 3.0. Continued research is warranted to ensure the reliability of large language models.
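The concept extraction result above is reported as an F1-score against manual annotations. As a minimal sketch of how such a score is computed, the code below assumes exact matching of lower-cased, trimmed concept strings; the abstract does not state the matching rule used in the study, and the concept lists are hypothetical examples, not study data.

```python
def concept_prf1(predicted, gold):
    """Precision, recall and F1 of extracted concepts against gold annotations.

    Assumption (not stated in the abstract): a predicted concept counts as
    correct only if its lower-cased, trimmed string exactly matches a gold one.
    """
    pred = {c.strip().lower() for c in predicted}
    ref = {c.strip().lower() for c in gold}
    true_pos = len(pred & ref)  # concepts present in both sets
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical concepts for illustration only; not data from the study.
predicted = ["Type 2 diabetes", "HbA1c", "age", "pregnancy"]
gold = ["type 2 diabetes", "HbA1c", "age", "eGFR", "insulin"]
p, r, f = concept_prf1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a high score such as the reported 0.891 requires both missed and spurious concepts to be rare.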

2.
J Am Med Inform Assoc ; 31(5): 1062-1073, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38447587

ABSTRACT

BACKGROUND: Alzheimer's disease and related dementias (ADRD) affect over 55 million people globally. Current clinical trials suffer from low recruitment rates, a challenge that natural language processing (NLP) technologies could potentially address by helping researchers identify eligible clinical trial participants effectively. OBJECTIVE: This study investigates the sociotechnical feasibility of NLP-driven tools for ADRD research prescreening and analyzes the effect of the tools' cognitive complexity on usability to identify cognitive support strategies. METHODS: A randomized experiment was conducted with 60 clinical research staff using three prescreening tools (Criteria2Query, Informatics for Integrating Biology and the Bedside [i2b2], and Leaf). Cognitive task analysis was employed to analyze the usability of each tool using the Health Information Technology Usability Evaluation Scale. Data analysis involved descriptive statistics, interrater agreement via the intraclass correlation coefficient, cognitive complexity calculations, and Generalized Estimating Equations models. RESULTS: Leaf scored highest for usability, followed by Criteria2Query and i2b2. Cognitive complexity was affected by age, computer literacy, and number of criteria, but was not significantly associated with usability. DISCUSSION: Adopting NLP for ADRD prescreening demands careful task delegation, comprehensive training, precise translation of eligibility criteria, and increased research accessibility. The study highlights the relevance of these factors in enhancing the usability and efficacy of NLP-driven tools in clinical research prescreening. CONCLUSION: User-modifiable NLP-driven prescreening tools were favorably received, with system type, evaluation sequence, and the user's computer literacy influencing usability more than cognitive complexity. The study emphasizes NLP's potential to improve recruitment for clinical trials and endorses a mixed-methods approach for future system evaluation and enhancement.


Subject(s)
Alzheimer Disease , Medical Informatics , Humans , Natural Language Processing , Feasibility Studies , Eligibility Determination
3.
JAMIA Open ; 7(1): ooae021, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38455840

ABSTRACT

Objective: To automate scientific claim verification using PubMed abstracts. Materials and Methods: We developed CliVER, an end-to-end scientific Claim VERification system that leverages retrieval-augmented techniques to automatically retrieve relevant clinical trial abstracts, extract pertinent sentences, and use the PICO framework to support or refute a scientific claim. We also created an ensemble of three state-of-the-art deep learning models to classify rationales as supporting, refuting, or neutral. We then constructed CoVERt, a new COVID VERification dataset comprising 15 PICO-encoded drug claims accompanied by 96 manually selected and labeled clinical trial abstracts that either support or refute each claim. We used CoVERt and SciFact (a public scientific claim verification dataset) to assess CliVER's performance in predicting labels. Finally, we compared CliVER with clinicians in verifying 19 claims from 6 disease domains, using 189 648 PubMed abstracts from January 2010 to October 2021. Results: In the evaluation of label prediction accuracy on CoVERt, CliVER achieved an F1 score of 0.92, highlighting the efficacy of the retrieval-augmented models. The ensemble model outperformed each individual state-of-the-art model, with absolute F1 score increases ranging from 3% to 11%. Moreover, compared with four clinicians, CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction. Conclusion: CliVER demonstrates early potential to automate scientific claim verification, using retrieval-augmented strategies to harness the wealth of clinical trial abstracts in PubMed. Future studies are warranted to further test its clinical utility.

4.
Nat Commun ; 14(1): 7836, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38036523

ABSTRACT

African Americans have a significantly higher risk of developing chronic kidney disease, especially focal segmental glomerulosclerosis (FSGS), than European Americans. Two coding variants (G1 and G2) in the APOL1 gene play a major role in this disparity. While 13% of African Americans carry the high-risk recessive genotypes, only a fraction of these individuals develop FSGS or kidney failure, indicating the involvement of additional disease modifiers. Here, we show that the presence of the APOL1 p.N264K missense variant, when co-inherited with the G2 APOL1 risk allele, substantially reduces the penetrance of the G1G2 and G2G2 high-risk genotypes by rendering these genotypes low-risk. These results align with prior functional evidence showing that the p.N264K variant reduces the toxicity of the APOL1 high-risk alleles. These findings have important implications for our understanding of the mechanisms of APOL1-associated nephropathy, as well as for the clinical management of individuals with high-risk genotypes that include the G2 allele.


Subject(s)
Glomerulosclerosis, Focal Segmental , Humans , Glomerulosclerosis, Focal Segmental/genetics , Apolipoprotein L1/genetics , Genetic Predisposition to Disease , Risk Factors , Genotype , Apolipoproteins/genetics
5.
J Clin Transl Sci ; 7(1): e199, 2023.
Article in English | MEDLINE | ID: mdl-37830010

ABSTRACT

Background: Randomized clinical trials (RCTs) are the foundation of medical advances, but participant recruitment remains a persistent barrier to their success. This retrospective data analysis aims to (1) identify clinical trial features associated with successful participant recruitment, measured by accrual percentage, and (2) compare the characteristics of RCTs with the most and least successful recruitment, defined by varying accrual percentage thresholds such as ≥ 90% vs ≤ 10%, ≥ 80% vs ≤ 20%, and ≥ 70% vs ≤ 30%. Methods: Data from the internal research registry at Columbia University Irving Medical Center and the Aggregate Analysis of ClinicalTrials.gov were collected for 393 randomized interventional treatment studies closed to further enrollment. We compared two regularized linear regression models and six tree-based machine learning models for predicting accrual percentage (i.e., reported accrual to date divided by the target accrual). The best-performing model and Tree SHapley Additive exPlanations were used to analyze feature importance for participant recruitment. The identified features were compared between the two subgroups. Results: The CatBoost regressor outperformed the other models. Key features positively associated with recruitment success, as measured by accrual percentage, included government funding and compensation. Meanwhile, cancer research and non-conventional recruitment methods (e.g., websites) were negatively associated with recruitment success. Statistically significant subgroup differences (corrected p-value < .05) were found in 15 of the top 30 most important features. Conclusion: This multi-source retrospective study highlighted key features influencing RCT participant recruitment, offering actionable steps for improvement, including flexible recruitment infrastructure and appropriate participant compensation.
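The accrual percentage and the paired thresholds defined in this abstract can be illustrated with a short sketch. The function names and the trial figures below are hypothetical; only the definition (reported accrual to date divided by target accrual) and the threshold pairs come from the abstract.

```python
def accrual_percentage(reported_accrual: int, target_accrual: int) -> float:
    """Reported accrual to date divided by target accrual, as a percentage."""
    if target_accrual <= 0:
        raise ValueError("target accrual must be positive")
    return 100.0 * reported_accrual / target_accrual


def recruitment_subgroup(pct: float, high: float = 90.0, low: float = 10.0):
    """Assign a trial to the most/least successful subgroup for one threshold
    pair (e.g. >= 90% vs <= 10%); trials in between belong to neither."""
    if pct >= high:
        return "most successful"
    if pct <= low:
        return "least successful"
    return None


# Hypothetical trials: (reported accrual to date, target accrual).
trials = {"trial-A": (45, 50), "trial-B": (3, 40), "trial-C": (20, 40)}
for name, (reported, target) in trials.items():
    pct = accrual_percentage(reported, target)
    for high, low in [(90, 10), (80, 20), (70, 30)]:
        print(name, f"{pct:.1f}%", f">={high}/<={low}:",
              recruitment_subgroup(pct, high, low))
```

Widening the threshold pair (from 90/10 toward 70/30) moves more trials into the two comparison subgroups, at the cost of a less extreme contrast between them.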

6.
medRxiv ; 2023 Aug 04.
Article in English | MEDLINE | ID: mdl-37577628

ABSTRACT

Black Americans have a significantly higher risk of developing chronic kidney disease (CKD), especially focal segmental glomerulosclerosis (FSGS), than European Americans. Two coding variants (G1 and G2) in the APOL1 gene play a major role in this disparity. While 13% of Black Americans carry the high-risk recessive genotypes, only a fraction of these individuals develop FSGS or kidney failure, indicating the involvement of additional disease modifiers. Here, we show that the presence of the APOL1 p.N264K missense variant, when co-inherited with the G2 APOL1 risk allele, substantially reduces the penetrance of the G1G2 and G2G2 high-risk genotypes by rendering these genotypes low-risk. These results align with prior functional evidence showing that the p.N264K variant reduces the toxicity of the APOL1 high-risk alleles. These findings have important implications for our understanding of the mechanisms of APOL1-associated nephropathy, as well as for the clinical management of individuals with high-risk genotypes that include the G2 allele.

7.
AMIA Jt Summits Transl Sci Proc ; 2023: 281-290, 2023.
Article in English | MEDLINE | ID: mdl-37350899

ABSTRACT

Participant recruitment continues to be a challenge to the success of randomized controlled trials, resulting in increased costs, extended trial timelines, and delayed treatment availability. The literature provides evidence that study design features (e.g., trial phase, study site involvement) and trial sponsor are significantly associated with recruitment success. Principal investigators oversee the conduct of clinical trials, including recruitment. Through a cross-sectional survey and a thematic analysis of free-text responses, we assessed the perceptions of sixteen principal investigators regarding success factors for participant recruitment. From the principal investigators' perspective, study site involvement and funding source do not necessarily make recruitment easier or more challenging. The most commonly used recruitment strategies (e.g., in-person recruitment, reviewing electronic medical records for prescreening) are also the least effort-efficient. Finally, we recommend actionable steps, such as improving staff support and leveraging informatics-driven approaches, to help clinical researchers enhance participant recruitment.

8.
J Biomed Inform ; 142: 104375, 2023 06.
Article in English | MEDLINE | ID: mdl-37141977

ABSTRACT

OBJECTIVE: Feasible, safe, and inclusive eligibility criteria are crucial to successful clinical research recruitment. Existing expert-centered methods for selecting eligibility criteria may not be representative of real-world populations. This paper presents a novel model called OPTEC (OPTimal Eligibility Criteria), based on the Multiple Attribute Decision Making method and boosted by an efficient greedy algorithm. METHODS: OPTEC systematically identifies the optimal criteria combination for a given medical condition, with the optimal tradeoff among feasibility, patient safety, and cohort diversity. The model offers flexible attribute configuration and is generalizable to various clinical domains. It was evaluated in two clinical domains (Alzheimer's disease and neoplasm of the pancreas) using two datasets (the MIMIC-III dataset and the NewYork-Presbyterian/Columbia University Irving Medical Center (NYP/CUIMC) database). RESULTS: Using OPTEC, we simulated the automatic optimization of eligibility criteria according to user-specified prioritization preferences and generated recommendations based on the top-ranked criteria combinations (top 0.41-2.75%). Building on the model, we designed an interactive criteria recommendation system and conducted a case study with an experienced clinical researcher using the think-aloud protocol. CONCLUSIONS: The results demonstrated that OPTEC can be used to recommend feasible eligibility criteria combinations and to provide actionable recommendations for clinical study designers to construct a feasible, safe, and diverse cohort definition during early study design.


Subject(s)
Algorithms , Research Design , Humans , Patient Selection , Eligibility Determination , Research Personnel
9.
Int J Med Inform ; 171: 104985, 2023 03.
Article in English | MEDLINE | ID: mdl-36638583

ABSTRACT

BACKGROUND: Participant recruitment is a barrier to successful clinical research. One strategy to improve recruitment is eligibility prescreening, a resource-intensive process in which clinical research staff manually review electronic health record data to identify potentially eligible patients. Criteria2Query (C2Q) was developed to address this problem by using natural language processing to generate queries that semi-autonomously identify eligible participants from clinical databases. OBJECTIVE: We examined clinical research staff's perceived usability of C2Q for clinical research eligibility prescreening. METHODS: Twenty clinical research staff evaluated the usability of C2Q using a cognitive walkthrough with a think-aloud protocol and the Post-Study System Usability Questionnaire. On-screen activity and audio were recorded and transcribed. After every five evaluators completed an evaluation, informatics experts rated the usability problems and prioritized them for system refinement. There were four iterations of system refinement based on the evaluation feedback. Guided by the Organizational Framework for Intuitive Human-computer Interaction, we performed a directed deductive content analysis of the verbatim transcriptions. RESULTS: Evaluators were aged 24 to 46 years (mean 33.8; SD 7.32) and demonstrated high computer literacy (mean 6.36; SD 0.17); 75% were female, 35% were White, and 45% were clinical research coordinators. C2Q demonstrated high usability during the final cycle (2.26 out of 7 [lower scores are better]; SD 0.74). The number of unique usability issues decreased after each refinement. Fourteen subthemes emerged from three themes: seeking user goals, performing well-learned tasks, and determining what to do next. CONCLUSIONS: The cognitive walkthrough with a think-aloud protocol informed iterative system refinement and demonstrated the usability of C2Q by clinical research staff.
Key recommendations for system development and implementation include improving system intuitiveness and overall user experience through comprehensive consideration of user needs and requirements for task completion.


Subject(s)
Natural Language Processing , User-Computer Interface , Humans , Female , Young Adult , Adult , Middle Aged , Computers , Electronic Health Records , Records
10.
J Am Med Inform Assoc ; 30(2): 256-272, 2023 01 18.
Article in English | MEDLINE | ID: mdl-36255273

ABSTRACT

OBJECTIVE: To identify and characterize clinical subgroups of hospitalized Coronavirus Disease 2019 (COVID-19) patients. MATERIALS AND METHODS: Electronic health records of hospitalized COVID-19 patients at NewYork-Presbyterian/Columbia University Irving Medical Center were temporally sequenced and transformed into patient vector representations using Paragraph Vector models. K-means clustering was performed to identify subgroups. RESULTS: A diverse cohort of 11 313 patients with COVID-19 hospitalized between March 2, 2020 and December 1, 2021 was identified (median [IQR] age: 61.2 [40.3-74.3] years; 51.5% female). Twenty subgroups of hospitalized COVID-19 patients, labeled by increasing severity, were characterized by their demographics, conditions, outcomes, and severity (mild-moderate/severe/critical). Subgroup temporal patterns were characterized by the duration spent in each subgroup, transitions between subgroups, and complete paths throughout the course of hospitalization. DISCUSSION: Several subgroups had mild-moderate severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections but were hospitalized for underlying conditions (pregnancy, cardiovascular disease [CVD], etc.). Subgroup 7 included solid organ transplant recipients, most of whom developed mild-moderate or severe disease. Subgroup 9 had a history of type 2 diabetes, kidney disease, and CVD, and had the highest rates of heart failure (45.2%) and end-stage renal disease (80.6%). Subgroup 13 was the oldest (median age: 82.7 years) and had mixed severity but high mortality (33.3%). Subgroup 17 had critical disease and the highest mortality (64.6%), with age (median: 68.1 years) being the only notable risk factor. Subgroups 18-20 had critical disease with high complication rates and long hospitalizations (median: 40+ days). All subgroups are detailed in the full text.
A chord diagram depicts the most common transitions, and the paths with the highest prevalence, the longest hospitalizations, and the lowest and highest mortality are presented. Understanding these subgroups and their pathways may help clinicians make decisions that lead to better management and earlier intervention.


Subject(s)
COVID-19 , Cardiovascular Diseases , Humans , Female , Middle Aged , Aged , Male , SARS-CoV-2 , Electronic Health Records , Hospitalization
11.
Stud Health Technol Inform ; 290: 309-313, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673024

ABSTRACT

The rapid growth in the number of clinical trials launched in recent years poses significant challenges for accurate and efficient trial search. Keyword-based clinical trial search engines require users to construct effective queries, which can be difficult given complex information needs. In this study, we present an interactive clinical trial search interface that retrieves trials similar to a target clinical trial. It lets users configure 13 clinical trial features and 4 metrics (Jaccard similarity, semantic-based similarity, temporal overlap, and geographical distance) for measuring pairwise trial similarity. Among 1,007 coronavirus disease 2019 (COVID-19) trials conducted in the United States, 91.9% had similar trials at a similarity threshold of 0.85, and 43.8% had highly similar trials at a threshold of 0.95. A simulation study using 3 groups of similar trials curated from COVID-19 clinical trial reviews assessed the precision and recall of the search interface.


Subject(s)
COVID-19 , Benchmarking , Data Collection , Humans , Search Engine , Semantics
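The search interface in entry 11 scores pairwise trial similarity with four configurable metrics, one of which is Jaccard similarity. The sketch below illustrates only that one metric on a single feature set, followed by threshold filtering at the 0.85 level quoted in the abstract. The trial names and features are hypothetical, and the interface's actual feature encoding and the way it combines its four metrics are not described in the abstract.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)


def similar_trials(target, candidates, threshold=0.85):
    """Return (trial_id, score) pairs scoring >= threshold, best first.

    Uses Jaccard similarity on a single feature set only; how the interface
    weights and combines its four metrics is not described in the abstract.
    """
    scored = ((tid, jaccard(target, feats)) for tid, feats in candidates.items())
    hits = [(tid, s) for tid, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical trials described by one feature set (conditions/interventions).
target = {"COVID-19", "remdesivir", "hospitalized", "adult"}
candidates = {
    "trial-A": {"COVID-19", "remdesivir", "hospitalized", "adult"},
    "trial-B": {"COVID-19", "remdesivir", "hospitalized", "adult", "pediatric"},
    "trial-C": {"COVID-19", "vaccine"},
}
print(similar_trials(target, candidates, threshold=0.85))  # only trial-A
print(similar_trials(target, candidates, threshold=0.80))  # trial-A, trial-B
```

Raising the threshold (for example from 0.85 to 0.95) keeps only closer matches, which is consistent with the smaller share of trials reported as highly similar at 0.95.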
12.
Stud Health Technol Inform ; 294: 392-396, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612103

ABSTRACT

Anecdotally, 38.5% of clinical outcome descriptions in randomized controlled trial publications contain complex text. Existing terminologies are insufficient to standardize outcomes and their measures, temporal attributes, quantitative metrics, and other attributes. In this study, we analyzed the semantic patterns in the outcome text in a sample of COVID-19 trials and presented a data-driven method for modeling outcomes. We conclude that a data-driven knowledge representation can benefit natural language processing of outcome text from published clinical studies.


Subject(s)
COVID-19 , Humans , Natural Language Processing , Semantics
13.
J Am Med Inform Assoc ; 29(7): 1161-1171, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35426943

ABSTRACT

OBJECTIVE: To combine machine efficiency and human intelligence for converting complex clinical trial eligibility criteria text into cohort queries. MATERIALS AND METHODS: Criteria2Query (C2Q) 2.0 was developed to enable real-time user intervention for criteria selection and simplification, parsing error correction, and concept mapping. The accuracy, precision, recall, and F1 score of the enhanced modules for negation scope detection and for temporal and value normalization were evaluated using a previously curated gold standard: the annotated eligibility criteria of 1010 COVID-19 clinical trials. Usability and usefulness were evaluated by 10 research coordinators in a task-oriented usability evaluation using 5 Alzheimer's disease trials. Data were collected through user interaction logging, a demographic questionnaire, the Health Information Technology Usability Evaluation Scale (Health-ITUES), and a feature-specific questionnaire. RESULTS: The accuracies of negation scope detection, temporal normalization, and value normalization were 0.924, 0.916, and 0.966, respectively. C2Q 2.0 achieved a moderate usability score (3.84 out of 5) and a high learnability score (4.54 out of 5). On average, 9.9 modifications were made per clinical study, and experienced researchers made more modifications than novice researchers. The most frequent modification was deletion (5.35 per study). Furthermore, the evaluators favored the cohort queries resulting from modifications (score 4.1 out of 5) and the user engagement features (score 4.3 out of 5). DISCUSSION AND CONCLUSION: Features that engage domain experts and overcome the limitations of automated machine output were shown to be useful and user-friendly. We conclude that human-computer collaboration is key to improving the adoption and user-friendliness of natural language processing.


Subject(s)
COVID-19 , Artificial Intelligence , Eligibility Determination/methods , Humans , Natural Language Processing , Patient Selection
14.
Stud Health Technol Inform ; 281: 984-988, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042820

ABSTRACT

Clinical trial eligibility criteria are important for selecting the right participants for clinical trials. However, they are often complex and not computable. This paper presents the participatory design of a human-computer collaboration method for criteria simplification that includes natural language processing followed by user-centered eligibility criteria simplification. A case study on the ARCADIA trial shows how criteria were simplified for structured database querying by clinical researchers and identifies rules for criteria simplification and concept normalization.


Subject(s)
Natural Language Processing , Research Personnel , Databases, Factual , Eligibility Determination , Humans